Introduction

Project Aim

To determine how accurately expert wine quality ratings can be predicted using a set of easily measured chemical components.

Data

Looking at the Red Wine Data

Looking at the White Wine Data

Correlations: Red Wine

Correlations: White Wine

Methods

Methods: Linear Regression

Methods: Partial Proportional Odds Models

Three different approaches were considered: proportional odds, partial proportional odds, and non-proportional odds.
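
For orientation, the three model types named in the results tables (proportional odds, partial proportional odds, non-proportional odds) are usually written as cumulative-logit models. The block below is a textbook-style sketch, not the exact parameterization used in this analysis: the symbols Y, J, x, z, alpha, beta and gamma are assumed notation, and some software reverses the sign of the slope term.

```latex
% Ordinal rating Y with ordered categories 1..J, predictor vector x.
% All notation here is assumed for illustration.
\begin{align*}
% Proportional odds: one slope vector shared by all J-1 cut points
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  &= \alpha_j + \mathbf{x}^\top \boldsymbol{\beta} \\
% Partial proportional odds: a subset z of the predictors gets
% cut-point-specific adjustments gamma_j
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  &= \alpha_j + \mathbf{x}^\top \boldsymbol{\beta}
     + \mathbf{z}^\top \boldsymbol{\gamma}_j \\
% Non-proportional odds: every predictor has its own slope per cut point
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  &= \alpha_j + \mathbf{x}^\top \boldsymbol{\beta}_j
\end{align*}
% In each case j = 1, ..., J-1.
```

The proportional odds model is the special case in which all adjustments gamma_j are zero, and the non-proportional model is the case in which z contains every predictor.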

Methods: Multinomial Regression

Methods: Random Forest

Variable Selection

Model Evaluation

Models were compared on the following metrics: overall accuracy, kappa, weighted kappa, and percent correct by quality category.
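
The metrics reported in the results tables (accuracy, kappa, weighted kappa, percent correct by category) can be computed from paired actual and predicted ratings. The following is a minimal sketch in plain Python; the function names are hypothetical, and the linear weighting in the weighted kappa is an assumption, since the weighting scheme used for the reported values is not stated.

```python
def accuracy(actual, predicted):
    """Share of ratings predicted exactly right."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def kappa(actual, predicted, weights=None):
    """Cohen's kappa for numeric ordinal labels.

    weights=None gives the unweighted statistic; "linear" or "quadratic"
    penalize a miss by its (squared) distance on the rating scale.
    The choice of weighting is an assumption, not taken from the slides.
    """
    labels = sorted(set(actual) | set(predicted))
    n = len(actual)
    span = labels[-1] - labels[0]

    # Observed joint proportions and their marginals.
    joint = {(a, p): 0.0 for a in labels for p in labels}
    for a, p in zip(actual, predicted):
        joint[(a, p)] += 1 / n
    row = {a: sum(joint[(a, p)] for p in labels) for a in labels}
    col = {p: sum(joint[(a, p)] for a in labels) for p in labels}

    def penalty(a, p):
        if weights is None:
            return 0.0 if a == p else 1.0
        d = abs(a - p) / span if span else 0.0
        return d if weights == "linear" else d ** 2

    observed = sum(penalty(a, p) * joint[(a, p)] for a in labels for p in labels)
    expected = sum(penalty(a, p) * row[a] * col[p] for a in labels for p in labels)
    if expected == 0:
        return float("nan")  # only one category present; kappa is undefined
    return 1 - observed / expected


def percent_correct_by_category(actual, predicted):
    """For each actual rating, the percentage of cases predicted correctly."""
    result = {}
    for label in sorted(set(actual)):
        pairs = [(a, p) for a, p in zip(actual, predicted) if a == label]
        result[label] = 100 * sum(a == p for a, p in pairs) / len(pairs)
    return result


if __name__ == "__main__":
    # Toy ratings, invented purely to exercise the functions.
    actual = [5, 5, 6, 6, 7]
    predicted = [5, 6, 6, 6, 5]
    print(accuracy(actual, predicted))
    print(kappa(actual, predicted))
    print(kappa(actual, predicted, weights="linear"))
    print(percent_correct_by_category(actual, predicted))
```

Weighted kappa gives partial credit for near misses, which suits an ordinal outcome such as a quality score: predicting 6 for a wine rated 5 is penalized less than predicting 8.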

Results: Linear Model

Comparison of Full and Reduced Linear Regression Models for Red and White Wine Quality

                    Overall Results                       Percent Correct by Category
Model               Accuracy (%)  Kappa   Weighted Kappa  3      4      5      6      7      8      9
Full (Red)          57.73         0.2998  0.4996          0      0      67.65  64.57  23.08  0      NA
Reduced (Red)       56.15         0.2728  0.4545          0      0      65.44  63.78  20.51  0      NA
Full (White)        52.61         0.2162  0.4211          0      0      39.86  81.78  22.16  0      0
Reduced (White)     51.12         0.1969  0.4067          0      0      41.58  78.13  20.45  0      0

Results: Linear Model (Red Wine)

Results: Linear Model (White Wine)

Results: Partial Proportional Odds Model (White Wine)

Results: Proportional Odds Model (White Wine)

Results: Partial Proportional Odds Model (Red Wine)

Results: Comparison of Partial Proportional Odds Models (Red Wine)

Comparison of Partial Proportional Odds Models for Red Wine Quality

Model                      Accuracy (%)  Kappa   Weighted Kappa
Proportional Odds          58.0442       0.3065  0.4707
Partial Proportional Odds  58.3596       0.3117  0.4742
Non-Proportional Odds      57.7287       0.3080  0.4357

Results: Proportional Odds Model (Red Wine)

Results: Partial Proportional Odds Model (Red Wine)

Results: Non-Proportional Odds Model (Red Wine)

Results: Multinomial Regression (Red Wine Quality Classification)

Comparison of Multinomial Regression Models for Red Wine Quality

Model                               Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8
Full Model (Linear Terms)           58.3596       0.3158  0.5227          0        20       74.2647  55.9055  28.2051  0
Reduced Model (Linear Terms)        58.3596       0.3193  0.4952          0        10       72.7941  56.6929  33.3333  0
Reduced Model (Second Order Terms)  55.2050       0.2702  0.4789          0        0        67.6471  55.1181  33.3333  0

Results: Multinomial Regression - Full Model Confusion Matrix (Red Wine)

Results: Multinomial Regression (White Wine Quality Classification)

Comparison of Multinomial Regression Models for White Wine Quality

Model                               Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8        9
Full Model (Linear Terms)           54.0900       0.2451  0.4121          0        9.375    51.2027  79.4989  15.9091  0        0
Reduced Model (Linear Terms)        51.9427       0.2201  0.4093          0        3.125    54.6392  72.6651  16.4773  0        0
Reduced Model (Second Order Terms)  53.2720       0.2396  0.4010          0        3.125    54.2955  74.4875  19.8864  0        0

Results: Multinomial Regression - Full Model Confusion Matrix (White Wine)

Results: Random Forest

Random Forest Results for Red and White Wine

            Overall Results                       Percent Correct by Category
Prediction  Accuracy (%)  Kappa   Weighted Kappa  3      4      5      6      7      8      9
Red Wine    70.98         0.5263  0.6168          0      0      83.82  70.87  53.85  0.00   NA
White Wine  67.28         0.4862  0.6542          0      25     67.35  80.87  47.73  42.86  0

Results: Random Forest (Variable Importance)

Results: Random Forest (Red Wine)

Results: Random Forest (White Wine)

Comparison of Results: Red Wine

Comparison of Results for Red Wine

                           Overall Results                       Percent Correct by Category
Prediction                 Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8
Random Forest              70.9800       0.5263  0.6168          0        0        83.8200  70.8700  53.8500  0
Proportional Odds          58.0442       0.3065  0.4707          0        0        72.0588  59.8425  25.6410  0
Partial Proportional Odds  58.3596       0.3117  0.4742          0        0        72.0588  59.8425  28.2051  0
Non-Proportional Odds      57.7287       0.3080  0.4357          0        0        72.0588  56.6929  33.3333  0
Multinomial                58.3596       0.3158  0.5227          0        20       74.2647  55.9055  28.2051  0
Linear Regression          56.1500       0.2728  0.4545          0        0        65.4400  63.7800  20.5100  0

Comparison of Results: White Wine

Comparison of Results for White Wine

                   Overall Results                       Percent Correct by Category
Prediction         Accuracy (%)  Kappa   Weighted Kappa  3        4        5        6        7        8        9
Random Forest      67.2800       0.4862  0.6542          0        25.000   67.3500  80.8700  47.7300  42.86    0
Proportional Odds  51.7382       0.2108  0.3993          0        0.000    51.2027  74.9431  15.9091  0.00     0
Multinomial        54.0900       0.2451  0.4121          0        9.375    51.2027  79.4989  15.9091  0.00     0
Linear Regression  51.1200       0.1969  0.4067          0        0.000    41.5800  78.1300  20.4500  0.00     0

Discussion: Random Forest

Discussion: Likelihood-Based Approaches

Discussion: Limitations and Future Directions

Bottom Line

Expert wine quality ratings can be predicted reasonably well from chemical components (the random forest classified about 71% of red wines and 67% of white wines correctly), but true wine connoisseurs are still better off consulting a sommelier.

References

Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-74.